Classifying free-text triage chief complaints into syndromic categories with natural language processing
Abstract
OBJECTIVE Develop and evaluate a natural language processing application for classifying chief complaints into syndromic categories for syndromic surveillance.
INTRODUCTION Much of the input data for artificial intelligence applications in the medical field consists of free-text patient medical records, including dictated medical reports and triage chief complaints. To be useful for automated systems, the free text must be translated into encoded form.
METHODS We implemented a biosurveillance detection system from Pennsylvania to monitor the 2002 Winter Olympic Games. Because the input data were in free-text format, we used a natural language processing text classifier to automatically classify free-text triage chief complaints into the syndromic categories used by the biosurveillance system. The classifier was trained on 4700 chief complaints from Pennsylvania. We evaluated its ability to classify free-text chief complaints into syndromic categories with a test set of 800 chief complaints from Utah.
RESULTS The classifier produced the following areas under the ROC curve: Constitutional = 0.95; Gastrointestinal = 0.97; Hemorrhagic = 0.99; Neurological = 0.96; Rash = 1.0; Respiratory = 0.99; Other = 0.96. Using information stored in the system's semantic model, we extracted from the Respiratory classifications lower respiratory complaints and lower respiratory complaints with fever with a precision of 0.97 and 0.96, respectively.
CONCLUSION Results suggest that a trainable natural language processing text classifier can accurately extract data from free-text chief complaints for biosurveillance.
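To make the evaluation concrete, the sketch below shows one way a trainable text classifier could assign chief complaints to syndromic categories and how a per-category area under the ROC curve could be computed. It is only an illustration: the abstract does not describe the classifier's internals, so the naive Bayes bag-of-words model, the function names, and the example complaints are all assumptions, not the authors' system or data.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Count labels and words from (chief_complaint_text, syndrome) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def posterior(model, text):
    """Return P(syndrome | text) for every syndrome, using add-one smoothing."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, count in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(count / total)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp_scores = {l: math.exp(s - top) for l, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {l: v / norm for l, v in exp_scores.items()}

def auc(scores, is_positive):
    """Area under the ROC curve by pairwise comparison; ties count as 0.5."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented toy data, standing in for the training and test sets.
train = [
    ("cough and fever", "Respiratory"),
    ("shortness of breath", "Respiratory"),
    ("vomiting and diarrhea", "Gastrointestinal"),
    ("abdominal pain nausea", "Gastrointestinal"),
    ("rash on arms", "Rash"),
    ("itchy rash", "Rash"),
]
test = [
    ("cough", "Respiratory"),
    ("nausea vomiting", "Gastrointestinal"),
    ("rash", "Rash"),
]

model = train_naive_bayes(train)
# One-vs-rest AUC for a single category, as reported per syndrome in RESULTS.
resp_scores = [posterior(model, text)["Respiratory"] for text, _ in test]
resp_truth = [label == "Respiratory" for _, label in test]
resp_auc = auc(resp_scores, resp_truth)
```

Scoring each category one-vs-rest mirrors how the abstract reports a separate area under the ROC curve for every syndrome. A real evaluation would need a held-out test set from a different site, as in the Pennsylvania and Utah split described above.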
Related articles
Assessing the performance of American chief complaint classifiers on Victorian syndromic surveillance data
Syndromic surveillance systems aim to support early detection of salient disease outbreaks, and to shed timely light on the size and spread of pandemic outbreaks. They can also be used more generally to monitor disease trends and provide reassurance that an outbreak has not occurred. One commonly used technique for syndromic surveillance is concerned with classifying Emergency Department data, ...
A Term-based Approach to Asyndromic Determination of Significant Case Clusters
Introduction Biosurveillance systems commonly depend on free-text chief complaints (CCs) for timely situational awareness. However, diagnosis codes may not be available soon enough and may have uncertain value because they are assigned for billing purposes rather than for population monitoring. Existing systems use syndrome categories to classify records based on these free-text fields. A syndr...
Multilingual chief complaint classification for syndromic surveillance: An experiment with Chinese chief complaints
PURPOSE Syndromic surveillance is aimed at early detection of disease outbreaks. An important data source for syndromic surveillance is free-text chief complaints (CCs), which may be recorded in different languages. For automated syndromic surveillance, CCs must be classified into predefined syndromic categories to facilitate subsequent data aggregation and analysis. Despite the fact that syndr...
Evaluation of preprocessing techniques for chief complaint classification
OBJECTIVE To determine whether preprocessing chief complaints before automatically classifying them into syndromic categories improves classification performance. METHODS We preprocessed chief complaints using two preprocessors (CCP and EMT-P) and evaluated whether classification performance increased for a probabilistic classifier (CoCo) or for a keyword-based classifier (modification of the...
Identifying ILI Cases from Chief Complaints: Comparing Keyword and Support Vector Machine Methods
The rapid spread of the novel H1N1 virus prompted Ottawa Public Health (OPH) to monitor Emergency Department Chief Complaints (EDCC) specifically for influenza-like illness (ILI). Note that data from ED visits is the most common data source for syndromic surveillance systems in the US [1]. METHODS Our data set was formed of 149910 case records composed of free text EDCC and accompanying patient...
Journal title:
- Artificial Intelligence in Medicine
Volume 33, Issue 1
Pages: -
Publication date: 2005